Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1] Add sourceHost and destHost to flow metrics (with improvements) #1854

Conversation

Karen-Schoener
Copy link
Collaborator

@Karen-Schoener Karen-Schoener commented Dec 17, 2024

Fixed an issue with PR #1824.

Tested skupper console with this fix.

Tested with external vm-client and external vm-server.
Verified that in/out metrics displayed as expected in the skupper console.

Tested with hello-world frontend/backend deployments.
Verified that in/out metrics displayed as expected in the skupper console.

This change adds sourceHost and destHost to the flow metrics produced by the flow collector. It acomplishes this by adding those labels to the promethus metrics, and changing the aggregation key for metric aggreagation in the flow collector.

This approach may change to one that does not require new labels on the prometheus metric. To be discussed.
Fix to populate destHost label correctly in revLabels
@Karen-Schoener Karen-Schoener marked this pull request as draft December 17, 2024 21:34
@Karen-Schoener Karen-Schoener self-assigned this Dec 17, 2024
@Karen-Schoener
Copy link
Collaborator Author

Hi @ajssmith @c-kruse When you have a minute, I had a question on the 2 minute wait that I noticed in needForSiteProcess.
https://github.com/skupperproject/skupper/blob/main/pkg/flow/flow_mem_driver.go#L2799

I was just curious why a 2 minute wait is needed here...

Btw, as a next step, I'll work on an approach that does not add new labels.

Thanks, Karen

@Karen-Schoener Karen-Schoener changed the title Add sourceHost and destHost to flow metrics (with improvements) [v1] Add sourceHost and destHost to flow metrics (with improvements) Dec 17, 2024
@Karen-Schoener Karen-Schoener changed the base branch from main to 1.8 December 17, 2024 22:02
@Karen-Schoener Karen-Schoener changed the base branch from 1.8 to main December 17, 2024 22:04
@c-kruse
Copy link
Contributor

c-kruse commented Dec 18, 2024

@Karen-Schoener, Andy probably has a better answer for you here, but I believe that delay is intended to guard against the collector prematurely associating flows with a site-process or site-client when a new process record could be en-route from a controller.

One hundred and twenty seconds is probably a bit much, but the collector very much receives vanflow messages in wild orders (especially from different event sources - like is the case with ProcessRecords and FlowRecords which come from the "Controller" part of pkg/flow and FlowRecords that come from the skupper router.) Because these site server and site client processes hang around (don't tend to get cleaned up until the site goes away), and because flow pairs get permanently associated with the process, we were trying to be careful to avoid this.

@Karen-Schoener
Copy link
Collaborator Author

Closing in favor of PR #1857

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants